
International Migration, Integration and Social Cohesion online publications

Explore Bristol Research

Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees

Author: Fokkema M.
Hothorn T.
Kelderman H.
Smits N.
Zeileis A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2018

Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models

Author: Fabian Scheipl
Fahrmeir L.
Hothorn T.
Lewis B.
Ludwig Fahrmeir
Polson N.
Sabanés Bové D.
Scheipl F.
Thomas Kneib
Publication venue: 'Informa UK Limited'
Publication date: 02/12/2011

Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that allows single coefficients, as well as blocks of coefficients representing specific model terms, to be included or excluded. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with real benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix.

arXiv.org e-Print Archive

University of Essex Research Repository

Ensemble of a subset of kNN classifiers

Author: A Karatzoglou
Aris Perperoglou
Asma Gul
Berthold Lausen
C Müssel
D Mease
DF Nettleton
E Bauer
EW Steyerberg
J Hernández-Orallo
J Kruppa
L Breiman
L Lausser
Miftahuddin Miftahuddin
O Mahmoud
Osama Mahmoud
P Hall
P Melville
R Barandela
R Maclin
RJ Samworth
S Li
T Cover
T Hothorn
T Khoshgoftaar
Werner Adler
Z Liu
Zardad Khan
ZH Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. Firstly, we choose classifiers based upon their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially starting from the best model and assessed for collective performance on a validation data set. We use benchmark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with usual kNN, bagged kNN, random kNN, multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.
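The two-step selection described in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`knn_predict`, `esknn`), the use of random feature subsets to create the candidate kNN models, the majority vote, and the rule "keep a model only if validation accuracy strictly improves" are all assumptions that the abstract does not spell out.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, features, k=3):
    """Classify x by majority vote of its k nearest training rows,
    using squared Euclidean distance on the given feature subset."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def ensemble_predict(members, train_X, train_y, X, k=3):
    """Majority vote over kNN models, one per feature subset in members."""
    preds = []
    for x in X:
        votes = Counter(knn_predict(train_X, train_y, x, f, k) for f in members)
        preds.append(votes.most_common(1)[0][0])
    return preds

def esknn(train, select, valid, n_models=30, subset_size=2, keep=10, k=3, seed=0):
    """Hypothetical sketch of the two steps named in the abstract."""
    rng = random.Random(seed)
    (tr_X, tr_y), (se_X, se_y), (va_X, va_y) = train, select, valid
    n_features = len(tr_X[0])
    # Candidate models: kNN on random feature subsets (an assumption).
    candidates = [
        tuple(sorted(rng.sample(range(n_features), subset_size)))
        for _ in range(n_models)
    ]
    # Step 1: rank candidates by individual out-of-sample accuracy.
    ranked = sorted(
        candidates,
        key=lambda f: accuracy(
            [knn_predict(tr_X, tr_y, x, f, k) for x in se_X], se_y
        ),
        reverse=True,
    )[:keep]
    # Step 2: add models sequentially, starting from the best one, and keep
    # a model only if collective validation accuracy improves (an assumption).
    ensemble = [ranked[0]]
    best = accuracy(ensemble_predict(ensemble, tr_X, tr_y, va_X, k), va_y)
    for f in ranked[1:]:
        trial = ensemble + [f]
        acc = accuracy(ensemble_predict(trial, tr_X, tr_y, va_X, k), va_y)
        if acc > best:
            ensemble, best = trial, acc
    return ensemble, best
```

Each of `train`, `select` and `valid` is a pair `(X, y)` of feature rows and class labels. The returned ensemble is a list of feature subsets, so non-informative features can drop out if no selected model uses them.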

Leiden University Scholary Publications

Explore Bristol Research

Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees

Author: Fokkema M.
Kelderman H.
Smits N.
Zeileis A.
Hothorn T.
Publication date: 01/01/2018

Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

An AUC-based Permutation Variable Importance Measure for Random Forests

Author: A Estabrooks
AL Boulesteix
Anne-Laure Boulesteix
C Chen
C Liu
C Strobl
Carolin Strobl
F Briggs
G Batista
J Chang
J Van Hulse
K Nicodemus
KK Nicodemus
L Breiman
M Calle
M Cummings
M Khalilia
M Kubat
M Pepe
N Japkowicz
R Blagus
Silke Janitza
T Fawcett
T Hothorn
T Khoshgoftaar
WJ Lin
Y Huang
Y Sun
Y Xie
Publication date: 01/11/2012

The random forest (RF) method is a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in the case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html
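The core idea of an AUC-based permutation importance can be illustrated with a small model-agnostic sketch. It is not the implementation in the R package party: there the measure is computed within the forest, whereas here a single hypothetical scoring function `score_fn` stands in for the fitted model, and the names `auc` and `auc_permutation_importance` are assumptions made for this example.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_permutation_importance(score_fn, X, y, feature, n_repeats=20, seed=0):
    """Mean drop in AUC after randomly permuting one predictor column.
    X is a list of rows (lists); score_fn maps a row to a class-1 score."""
    rng = random.Random(seed)
    baseline = auc([score_fn(row) for row in X], y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this predictor and y
        permuted = [
            row[:feature] + [value] + row[feature + 1:]
            for row, value in zip(X, column)
        ]
        drops.append(baseline - auc([score_fn(row) for row in permuted], y))
    return sum(drops) / n_repeats
```

Because the AUC compares positives with negatives pairwise, it does not depend on the class proportions in the way the error rate does, which is the motivation the abstract gives for using it under class imbalance.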

CiteSeerX

Open Access LMU

ZORA

Testing the additional predictive value of high-dimensional molecular data

Author: AL Boulesteix
Anne-Laure Boulesteix
C Truntzer
G Tutz
H Binder
H Höing
J Fridlyand
J Friedman
J Goeman
JJ Goeman
LJ van't Veer
M Schmidberger
O Gevaert
P Bühlmann
P Eden
R Tibshirani
S Chiaretti
T Golub
T Hothorn
Torsten Hothorn
X Li
Y Freund
Y Sun
Publication venue: BioMed Central
Publication date: 01/09/2009

While high-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction or diagnosis purposes for about ten years in biomedical research, the question of the additional predictive value of such data given that classical predictors are already available has long been under-considered in the bioinformatics literature. We suggest an intuitive permutation-based testing procedure for assessing the additional predictive value of high-dimensional molecular data. Our method combines two well-known statistical tools: logistic regression and boosting regression. We give clear advice for the choice of the only method parameter (the number of boosting iterations). In simulations, our novel approach is found to have very good power in different settings, e.g. few strong predictors or many weak predictors. For illustrative purposes, it is applied to two publicly available cancer data sets. Our simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors given that a few clinical covariates or a known prognostic index are already available.
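The permutation logic behind such a test can be sketched generically. The abstract names logistic regression and boosting as the building blocks but does not give the test statistic, so the sketch below leaves it as a user-supplied, hypothetical function `statistic(molecular, clinical, outcome)` for which larger values mean more additional predictive value; that convention, the function name and the `+1` correction in the p-value are assumptions of this example.

```python
import random

def permutation_p_value(statistic, molecular, clinical, outcome,
                        n_perm=200, seed=0):
    """Permutation p-value for the added value of the molecular block.

    Only the rows of the molecular data are shuffled, so the link between
    clinical covariates and outcome stays intact under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = statistic(molecular, clinical, outcome)
    exceed = 0
    for _ in range(n_perm):
        shuffled = molecular[:]   # copy the list of rows, then permute it
        rng.shuffle(shuffled)
        if statistic(shuffled, clinical, outcome) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

In a fuller version, `statistic` would refit the model on each permuted data set, for example a boosted model on the molecular predictors with the clinical model's linear predictor held fixed; that detail is again an assumption rather than something stated in the abstract.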

Directory of Open Access Journals

Open Access LMU

Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting

Author: A Beyerlein
A Mayr
Andreas Mayr
B Efron
F Sassi
I Jansen
JB Copas
JH Friedman
JJ Reilly
L Breiman
M Dehghan
N Fenske
N Meinshausen
Nora Fenske
P Bühlmann
R Development Core Team
R Koenker
R Tibshirani
R Whitaker
RA Rigby
T Hastie
T Hothorn
T Kneib
Torsten Hothorn
Y Wei
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. Methods: We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. Results: The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Conclusions: Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.
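The idea of modelling the borders of a PI directly as quantiles can be shown with the check (pinball) loss that quantile regression minimizes. The sketch below is a deliberately reduced illustration: it fits constant borders without covariates, whereas the abstract describes covariate-dependent additive quantile functions estimated by boosting. The function names are assumptions made for this example.

```python
def pinball_loss(y, q, tau):
    """Mean check loss of predicting the constant q for the tau-quantile."""
    total = 0.0
    for yi in y:
        # Under-prediction is weighted by tau, over-prediction by 1 - tau.
        total += tau * (yi - q) if yi >= q else (1 - tau) * (q - yi)
    return total / len(y)

def best_constant_quantile(y, tau):
    """Observed value that minimizes the pinball loss (an empirical quantile)."""
    return min(sorted(set(y)), key=lambda q: pinball_loss(y, q, tau))

def empirical_coverage(y, lower, upper):
    """Share of observations falling inside the interval [lower, upper]."""
    return sum(lower <= yi <= upper for yi in y) / len(y)

# A nominal 90% interval uses the 5% and 95% quantiles as its borders.
values = [float(v) for v in range(1, 100)]
lower = best_constant_quantile(values, 0.05)
upper = best_constant_quantile(values, 0.95)
coverage = empirical_coverage(values, lower, upper)
```

Because each border is estimated separately from its own loss, no symmetric or otherwise parametric shape is imposed on the distribution, which is what lets this kind of approach adapt to skewness.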

Leiden University Scholary Publications

Open Access LMU

Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees

Author: Fokkema M.
Hothorn T.
Kelderman H.
Smits N.
Zeileis A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/10/2017

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Prediction of peptide and protein propensity for amyloid formation

Author: A Quintas
A Trovato
AC Davison
AC Tsolis
Alexandre Quintas
AM Fernandez-Escamilla
AP Pawar
AV Finkelstein
B Rost
C Nerelius
Carlos Família
CM Dobson
D Eisenberg
David A. Phoenix
DJ Selkoe
DM Fowler
Eugene A. Permyakov
F Chiti
F Sasagawa
GG Tartaglia
H Hu
I Cherny
I Walsh
IV Baskakov
J Palau
J Tian
JC Rochet
JD Sipe
JM Zimmerman
JW Kelly
K Rajagopal
KF DuBay
KK Frousios
KT O’Neil
L Goldschmidt
LO Jimenez
M Belli
M Emily
M Hollander
M Kuhn
M López de la Paz
M Oliveberg
M Stefani
M Sunde
M Zamani
MB Kursa
MJ Thompson
MT Pastor
N Becker
N Qian
O Conchillo-Solé
PK Teng
PY Chou
RS Harrison
S Idicula-thomas
S Kawashima
S Maurer-Stroh
S Ventura
S Yoon
Sarah R. Dennison
SJ Hamodrakas
SK Maji
SO Garbuzynskiy
T Hothorn
T Scheibel
TPJ Knowles
VS Mathura
WH DePas
WT Astbury
Y Kallberg
Ž Eva
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 09/07/2014

Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone, with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.

Public Library of Science (PLOS)

Directory of Open Access Journals